@lmasroca lmasroca commented Oct 7, 2025

Added support for short hexadecimal escapes (\x00..\xff) and unicode escapes (\u0000..\uffff) for Java and JavaScript regular expressions.
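As a rough illustration of what the two escape forms denote, here is a minimal Kotlin sketch. The helper `decodeEscape` is hypothetical and is not the code added by this PR; it only shows that `\xHH` takes two hex digits and `\uHHHH` takes four, each mapping to a single UTF-16 code unit.

```kotlin
// Hypothetical helper (not EvoMaster's implementation): decodes one
// \xHH or \uHHHH escape into the character it denotes.
fun decodeEscape(escape: String): Char {
    val isShortHex = escape.startsWith("\\x") && escape.length == 4
    val isUnicode = escape.startsWith("\\u") && escape.length == 6
    require(isShortHex || isUnicode) { "Not a \\xHH or \\uHHHH escape: $escape" }
    val digits = escape.substring(2)
    // Accept only hexadecimal digits, so signs or other characters are rejected
    require(digits.all { it in '0'..'9' || it.lowercaseChar() in 'a'..'f' }) {
        "Invalid hex digits in escape: $escape"
    }
    return Char(digits.toInt(16))
}

fun main() {
    println(decodeEscape("\\x41"))                // A
    println(decodeEscape("\\u0042"))              // B
    // java.util.regex, which backs Kotlin's Regex on the JVM, accepts both forms
    println(Regex("\\x41\\u0042").matches("AB"))  // true
}
```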

@jgaleotti jgaleotti requested a review from arcuri82 October 7, 2025 20:25

@arcuri82 arcuri82 left a comment

@lmasroca @jgaleotti thanks for this PR! But I'm a bit confused about EOF... Not saying it is wrong, but I don't understand why it needed to be added, and what possible side effects it could have.

// Parser rules have first letter in lower-case

-pattern : disjunction;
+pattern : disjunction EOF;

Why this EOF?
How would it work when dealing with strings that don't have it?

// Parser rules have first letter in lower-case

-pattern : disjunction;
+pattern : disjunction EOF;

see previous comment


-val gene = RegexGene("regex", disjList,"${RegexGene.JAVA_REGEX_PREFIX}$text")
+// we remove the <EOF> token from end of the string to store as sourceRegex
+val gene = RegexGene("regex", disjList,"${RegexGene.JAVA_REGEX_PREFIX}${text.substring(0,text.length - EOF_TOKEN.length)}")

what if the text does not have EOF?
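One defensive way to address this concern is to strip the token only when it is actually present. The sketch below assumes `EOF_TOKEN` is the literal `<EOF>` mentioned in the diff comment; the helper name `stripEof` is hypothetical and not part of the PR.

```kotlin
// Assumption: EOF_TOKEN is the literal "<EOF>" that the parsed text ends with
const val EOF_TOKEN = "<EOF>"

// Hypothetical helper: removes a trailing "<EOF>" if present,
// otherwise returns the text unchanged.
fun stripEof(text: String): String = text.removeSuffix(EOF_TOKEN)

fun main() {
    println(stripEof("[a-z]+x<EOF>")) // [a-z]+x
    println(stripEof("[a-z]+x"))      // [a-z]+x

    // For comparison, the unconditional substring from the diff cuts
    // real characters when the suffix is missing:
    val noEof = "[a-z]+x"
    println(noEof.substring(0, noEof.length - EOF_TOKEN.length)) // [a
}
```

With the unconditional `substring`, a text shorter than the token would additionally throw a `StringIndexOutOfBoundsException`, whereas `removeSuffix` is a no-op in that case.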


-val gene = RegexGene("regex", disjList,"${RegexGene.JAVA_REGEX_PREFIX}$text")
+// we remove the <EOF> token from end of the string to store as sourceRegex
+val gene = RegexGene("regex", disjList,"${RegexGene.JAVA_REGEX_PREFIX}${text.substring(0, text.length - EOF_TOKEN.length)}")

see previous comment

@lmasroca

> @lmasroca @jgaleotti thanks for this PR! But I'm a bit confused about EOF... Not saying it is wrong, but I don't understand why it needed to be added, and what possible side effects it could have.

In ANTLR4, a start rule that does not end with EOF is not required to consume the whole input: the parser can stop once the rule has matched a valid prefix and silently ignore the rest. Adding EOF forces it to consume the entire input, which helps detect leftover or invalid tokens. This was needed for tests that intentionally feed invalid input. Regarding side effects, a regex containing invalid or unsupported syntax now causes an exception instead of having part of it silently dropped. See https://github.com/antlr/antlr4/blob/master/doc/parser-rules.md#start-rules-and-eof
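The behaviour described above can be sketched with a toy hand-written parser. This is not ANTLR and not the EvoMaster grammar: `parseWord` is a hypothetical stand-in for a start rule that matches one or more letters, and the `requireEof` flag plays the role of appending EOF to that rule.

```kotlin
// Toy stand-in for a start rule like `word : LETTER+ ;` (NOT ANTLR itself).
// Returns the number of characters consumed.
fun parseWord(input: String, requireEof: Boolean): Int {
    var pos = 0
    while (pos < input.length && input[pos].isLetter()) pos++
    require(pos > 0) { "expected at least one letter" }
    // Analogue of ending the start rule with EOF: reject leftover input
    if (requireEof && pos != input.length) {
        throw IllegalArgumentException("unexpected '${input[pos]}' at position $pos")
    }
    return pos
}

fun main() {
    // Without the EOF check, the trailing ")" is silently ignored
    println(parseWord("abc)", requireEof = false)) // 3

    // With the EOF check, the same input is reported as an error
    val result = runCatching { parseWord("abc)", requireEof = true) }
    println(result.exceptionOrNull()?.message) // unexpected ')' at position 3
}
```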

@arcuri82 arcuri82 changed the base branch from master to external-pr-lmasroca October 16, 2025 08:37
@arcuri82 arcuri82 merged commit ab428c8 into WebFuzzing:external-pr-lmasroca Oct 16, 2025
@arcuri82

merged into #1349 to be able to run CI on it

arcuri82 added a commit that referenced this pull request Oct 17, 2025